Loan Data Exploration by Allan Visochek
========================================================

## Investigate the following variables

Borrower Attributes:

>CreditGrade
>CreditScoreRangeLower
>EmploymentStatus
>IsBorrowerHomeOwner
>LoanMonthsSinceOrigination
>StatedMonthlyIncome
>IncomeRange
>BorrowerState
>Occupation
>DebtToIncomeRatio

Loan Attributes:

>Term
>BorrowerRate
>LoanOriginalAmount

```{r echo=FALSE, Load_the_Data}
# Load the Data
setwd('Documents/data_science/p3/final_project/')
loanData <- read.csv('../data/prosperLoanData.csv')

# Term has only a few distinct values, so treat it as a factor
loanData$Term <- factor(loanData$Term)

# Flag loans that carry a credit grade (an empty string means none)
loanData$HasCreditGrade <- !(loanData$CreditGrade == '')

# Bucket the continuous variables used in later plots
loanData$DebtLevel <- cut(loanData$DebtToIncomeRatio, c(0, .3, .49, 1, 10.5))
loanData$LoanPeriod <- cut(loanData$LoanMonthsSinceOrigination, c(0, 13, 56, 65, 105))
loanData$LoanPeriod2 <- cut(loanData$LoanMonthsSinceOrigination, c(0, 55, 105),
                            labels = c("post-recession", "pre-recession"))
loanData$BadCredit <- loanData$CreditScoreRangeLower < 600
loanData$DebtLevelBucket <- cut(loanData$DebtToIncomeRatio, c(0, 0.49, 1, 10))
loanData$LoanOriginalAmountBucket <- cut(loanData$LoanOriginalAmount, seq(0, 25000, 1000))
```

# Univariate Plots Section

## Warning: position_stack requires constant width: output may be incorrect

BorrowerRate (summary, followed by the 10th and 90th percentiles):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1340  0.1840  0.1928  0.2500  0.4975
##     10% 
## 0.09886
##    90% 
## 0.3099
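
The summary and percentile output above has the shape produced by base R's `summary()` and `quantile()`. The report's plotting code is not shown, so the following is only a minimal sketch of how such output can be generated; the `rates` vector is made up for illustration and is not taken from the dataset.

```r
# Hypothetical stand-in for loanData$BorrowerRate; the values are made up.
rates <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35)

# Six-number summary: Min, 1st Qu., Median, Mean, 3rd Qu., Max.
rate_summary <- summary(rates)
print(rate_summary)

# 10th and 90th percentiles, printed with "10%" and "90%" labels.
p10 <- quantile(rates, 0.10)
p90 <- quantile(rates, 0.90)
print(p10)
print(p90)
```

Reading the two percentiles together gives the range that holds the middle 80% of values, which is one way to motivate cut-offs for "low" and "high" rates.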

LoanOriginalAmount:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

EmploymentStatus:

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

IsBorrowerHomeOwner:

## False  True 
## 56459 57478
BorrowerState:

##          AK    AL    AR    AZ    CA    CO    CT    DC    DE    FL    GA 
##  5515   200  1679   855  1901 14717  2210  1627   382   300  6720  5008 
##    HI    IA    ID    IL    IN    KS    KY    LA    MA    MD    ME    MI 
##   409   186   599  5921  2078  1062   983   954  2242  2821   101  3593 
##    MN    MO    MS    MT    NC    ND    NE    NH    NJ    NM    NV    NY 
##  2318  2615   787   330  3084    52   674   551  3097   472  1090  6729 
##    OH    OK    OR    PA    RI    SC    SD    TN    TX    UT    VA    VT 
##  4197   971  1817  2972   435  1122   189  1737  6842   877  3278   207 
##    WA    WI    WV    WY 
##  3048  1842   391   150
Occupation:

##                                                        Accountant/CPA 
##                               3588                               3233 
##           Administrative Assistant                            Analyst 
##                               3688                               3602 
##                          Architect                           Attorney 
##                                213                               1046 
##                          Biologist                         Bus Driver 
##                                125                                316 
##                         Car Dealer                            Chemist 
##                                180                                145 
##                      Civil Service                             Clergy 
##                               1457                                196 
##                           Clerical                Computer Programmer 
##                               3164                               4478 
##                       Construction                            Dentist 
##                               1790                                 68 
##                             Doctor                Engineer - Chemical 
##                                494                                225 
##              Engineer - Electrical              Engineer - Mechanical 
##                               1125                               1406 
##                          Executive                            Fireman 
##                               4311                                422 
##                   Flight Attendant                       Food Service 
##                                123                               1123 
##            Food Service Management                          Homemaker 
##                               1239                                120 
##                           Investor                              Judge 
##                                214                                 22 
##                            Laborer                        Landscaping 
##                               1595                                236 
##                 Medical Technician                  Military Enlisted 
##                               1117                               1272 
##                   Military Officer                        Nurse (LPN) 
##                                346                                492 
##                         Nurse (RN)                       Nurse's Aide 
##                               2489                                491 
##                              Other                         Pharmacist 
##                              28617                                257 
##         Pilot - Private/Commercial  Police Officer/Correction Officer 
##                                199                               1578 
##                     Postal Service                          Principal 
##                                627                                312 
##                       Professional                          Professor 
##                              13628                                557 
##                       Psychologist                            Realtor 
##                                145                                543 
##                          Religious                  Retail Management 
##                                124                               2602 
##                 Sales - Commission                     Sales - Retail 
##                               3446                               2797 
##                          Scientist                      Skilled Labor 
##                                372                               2746 
##                      Social Worker         Student - College Freshman 
##                                741                                 41 
## Student - College Graduate Student           Student - College Junior 
##                                245                                112 
##           Student - College Senior        Student - College Sophomore 
##                                188                                 69 
##        Student - Community College         Student - Technical School 
##                                 28                                 16 
##                            Teacher                     Teacher's Aide 
##                               3759                                276 
##              Tradesman - Carpenter            Tradesman - Electrician 
##                                120                                477 
##               Tradesman - Mechanic                Tradesman - Plumber 
##                                951                                102 
##                       Truck Driver                    Waiter/Waitress 
##                               1675                                436
ProsperRating..Alpha.:

##           A    AA     B     C     D     E    HR 
## 29084 14551  5372 15581 18345 14274  9795  6935

CreditGrade:

##           A    AA     B     C     D     E    HR    NC 
## 84984  3315  3509  4389  5649  5153  3289  3508   141

CreditScoreRangeLower:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   660.0   680.0   685.6   720.0   880.0     591

IncomeRange:

##             $0      $100,000+      $1-24,999 $25,000-49,999 $50,000-74,999 
##            621          17337           7274          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806


StatedMonthlyIncome:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3200    4667    5608    6825 1750000

DebtToIncomeRatio:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

# Univariate Analysis

### What is the structure of your dataset?

There are 113,937 loans in the dataset with 81 features, 13 of which were used in the analysis:

Numerical Variables:

BorrowerRate, CreditScoreRangeLower, DebtToIncomeRatio, LoanOriginalAmount, StatedMonthlyIncome

Ordered factor variables:

(from best to worst / greatest to least…)

Term:

60, 36, 12

>(note that Term was a numerical variable but was transformed to a factor variable because it has very few distinct values)
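
The conversion described in the note can be sketched as follows. The `term` vector is made up for illustration; only the idea of calling `factor()` on a numeric column comes from the load chunk.

```r
# Hypothetical loan terms in months; made-up values, not rows from the data.
term <- c(36, 60, 36, 12, 36, 60)

# Treat the three distinct terms as categories rather than numbers.
term_factor <- factor(term)

# The distinct terms become the levels, sorted in increasing order.
levels(term_factor)

# Number of loans per term.
table(term_factor)
```

As a factor, Term can be used directly for grouping, faceting, or colouring in plots without being interpreted as a continuous scale.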

CreditGrade:

AA,A,B,C,D,E,HR,NC,none

ProsperRating..Alpha.:

AA,A,B,C,D,E,HR,none

IncomeRange:

$100,000+ ; $75,000-99,999 ; $50,000-74,999 ; $25,000-49,999 ; $1-24,999 ; $0 ; Not employed ; Not displayed

Unordered factor variables:

EmploymentStatus:

Employed, Full-time, Not employed, Part-time, Retired, Self-employed, none, Not available, Other

Occupation:

Accountant/CPA, Administrative Assistant, Analyst, Architect, Attorney, Biologist, Bus Driver, Car Dealer, Chemist, Civil Service, Clergy, Computer Programmer, Construction, Dentist, Doctor, Engineer - Chemical, Engineer - Electrical, Engineer - Mechanical, Executive, Fireman, Flight Attendant, Food Service, Food Service Management, Homemaker, Investor, Judge, Laborer, Landscaping, Medical Technician, Military Enlisted, Military Officer, Nurse (LPN), Nurse (RN), Nurse’s Aide, Other, Pharmacist, Pilot - Private/Commercial, Police Officer/Correction Officer, Postal Service, Principal, Professional, Professor, Psychologist, Realtor, Religious, Retail Management, Sales - Commission, Sales - Retail, Scientist, Skilled Labor, Social Worker, Student - College Freshman, Student - College Graduate Student, Student - College Junior, Student - College Senior, Student - College Sophomore, Student - Community College, Student - Technical School, Teacher, Teacher’s Aide, Tradesman - Carpenter, Tradesman - Electrician, Tradesman - Mechanic, Tradesman - Plumber, Truck Driver, Waiter/Waitress

BorrowerState:

AK, AL, AR, AZ, CA, CO, CT, DC, DE, FL, GA, HI, IA, ID, IL, IN, KS, KY, LA, MA, MD, ME, MI, MN, MO, MS, MT, NC, ND, NE, NH, NJ, NM, NV, NY, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WA, WI, WV, WY

Other observations:

The majority of loans go to individuals who are employed full time. Few loans are given out to individuals with low income or who are unemployed. Most borrowers do not have a credit grade. Loan amounts range from $1,000 to $35,000, and 75% of loans are for $12,000 or less. Nearly all of the loans given out are 100% funded. Most loans have 0 net principal loss. Most borrowers have 0 delinquencies in the past 7 years. Few loans were given out from Q4 2008 through Q2 2009.

### What is/are the main feature(s) of interest in your dataset?

The main features of interest are LoanOriginationQuarter and BorrowerRate.

### What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

It is hard to say at this point. All variables mentioned above were selected for the investigation because they are likely to have an impact on the borrower rate.

### Did you create any new variables from existing variables in the dataset?

loanData$BorrowerRateCategory <- cut(loanData$BorrowerRate, c(0, 0.1, 0.31, 0.5), labels = c('low', 'normal', 'high'))

loanData$HasCreditGrade <- !(loanData$CreditGrade == '' | is.na(loanData$CreditGrade))

loanData$HasProsperRating <- !(loanData$ProsperRating..Alpha. == '' | is.na(loanData$ProsperRating..Alpha.))

HasIncome <- loanData$IncomeRange == "$75,000-99,999" | loanData$IncomeRange == "$50,000-74,999" | loanData$IncomeRange == "$25,000-49,999" | loanData$IncomeRange == "$1-24,999"

loanData$DebtLevel <- cut(loanData$DebtToIncomeRatio, c(0, .3, .49, 1, 10.5))
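
To make the bucketing concrete, here is a small sketch of how `cut()` assigns values to intervals using the same break points as above. The `dti` and `rate` vectors are made-up stand-ins for `DebtToIncomeRatio` and `BorrowerRate`, not values from the dataset.

```r
# Made-up debt-to-income ratios standing in for loanData$DebtToIncomeRatio.
dti <- c(0.10, 0.30, 0.45, 0.80, 2.50, NA)

# Same break points as DebtLevel; intervals are right-closed: (a, b].
debt_level <- cut(dti, c(0, .3, .49, 1, 10.5))
table(debt_level, useNA = "ifany")

# Made-up borrower rates bucketed with explicit labels,
# mirroring BorrowerRateCategory.
rate <- c(0.05, 0.10, 0.20, 0.31, 0.40)
rate_category <- cut(rate, c(0, 0.1, 0.31, 0.5),
                     labels = c("low", "normal", "high"))
as.character(rate_category)
```

One detail worth keeping in mind: because the default intervals exclude their left endpoint, a value of exactly 0 falls outside the first bucket and becomes `NA` unless `include.lowest = TRUE` is passed. Both `BorrowerRate` and `DebtToIncomeRatio` have a minimum of 0 in the summaries above, so this may affect a few rows.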

The following variables are created in later sections:

### Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

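DebtToIncomeRatio and StatedMonthlyIncome are both heavily right-skewed. DebtToIncomeRatio has a third quartile of 0.32 but a maximum of 10.01, and StatedMonthlyIncome has a median of 4,667 but a maximum of 1,750,000.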

I reordered the following factor variables:

ProsperRating..Alpha., CreditGrade, IncomeRange, LoanOriginationQuarter
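
The reordering code itself is not shown in the report. A minimal sketch of one common way to do it in base R, assuming the goal is a best-to-worst level order, might look like this (the `grade` values are made up):

```r
# Made-up credit grades; by default factor() sorts levels alphabetically.
grade <- factor(c("B", "AA", "HR", "A", "C", "AA"))
levels(grade)

# Re-declare the levels from best to worst, listing each level exactly once.
grade_ordered <- factor(grade,
                        levels = c("AA", "A", "B", "C", "HR"),
                        ordered = TRUE)
levels(grade_ordered)

# With an ordered factor, comparisons follow the declared order.
grade_ordered[1] < grade_ordered[2]   # is "B" ranked before "AA"?
```

Plots that group by the factor then show the categories in the declared order rather than alphabetically, which matters for variables such as CreditGrade where "AA" should come before "A".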

# Bivariate Plots Section

BorrowerRate by Term:

## Term: 12
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0400  0.0929  0.1434  0.1501  0.2064  0.2669 
## -------------------------------------------------------- 
## Term: 36
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.1274  0.1815  0.1935  0.2599  0.4975 
## -------------------------------------------------------- 
## Term: 60
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0669  0.1490  0.1870  0.1930  0.2319  0.3304
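
Per-group summaries with `Term: 12` style headers are what base R's `by()` prints. The sketch below shows the pattern on a made-up data frame; only the column names are borrowed from the dataset.

```r
# Made-up rates and terms standing in for BorrowerRate and Term.
loans <- data.frame(
  BorrowerRate = c(0.10, 0.14, 0.18, 0.26, 0.12, 0.20),
  Term = factor(c(12, 12, 36, 36, 60, 60))
)

# One summary() per Term level, printed under a header for each level.
with(loans, by(BorrowerRate, Term, summary))

# Compact alternative: just the median rate per term.
median_by_term <- tapply(loans$BorrowerRate, loans$Term, median)
median_by_term
```

The `tapply()` form is convenient when a single statistic per group is needed, for example as input to a bar or line plot.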

# Bivariate Analysis

### Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

The average BorrowerRate decreased significantly from 2012 to 2014.

The lowest 10% of borrower rates has remained relatively steady.

The average loan amount dropped steeply in Q4 2008 and has been rising steadily since then.

It doesn’t look like there is a significant relationship from this graph.

It looks like there is a trend, although it is still not quite clear.

I’m going to make a new variable, DebtLevelBucket, to make a smoother plot.

BorrowerRate and debt level are correlated.
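
The smoothing idea behind DebtLevelBucket can be sketched as follows: bucket the ratio, then average the rate within each bucket. The data frame is made up for illustration; the break points are the ones used for `DebtLevelBucket` in the load chunk, and the use of `aggregate()` is an assumption about how the averaging could be done.

```r
# Made-up ratios and rates; only the column names follow the dataset.
loans <- data.frame(
  DebtToIncomeRatio = c(0.10, 0.20, 0.40, 0.60, 0.90, 2.00, 5.00),
  BorrowerRate      = c(0.12, 0.14, 0.18, 0.20, 0.24, 0.28, 0.30)
)

# Same breaks as DebtLevelBucket in the load chunk.
loans$DebtLevelBucket <- cut(loans$DebtToIncomeRatio, c(0, 0.49, 1, 10))

# Mean rate per bucket: a few points instead of one per loan.
mean_rate <- aggregate(BorrowerRate ~ DebtLevelBucket, data = loans, FUN = mean)
mean_rate
```

Plotting the bucket means instead of the raw points trades detail for readability, which helps when a scatterplot is too noisy to show a trend.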

Credit score and ratings:

Borrowers without a credit grade have a borrower rate that is average relative to the others.

Borrower rates are cleanly separated among the different Prosper ratings (in contrast to the credit grades, which overlap).

Borrowers without a Prosper rating have a median borrower rate that is significantly lower than the others.

Credit score varies much less among the different Prosper ratings than it does among the credit grades.

This is probably a different metric that the loan agency uses to assign borrower rates.

BorrowerRate varies significantly by CreditGrade.

Borrower rates seem to vary across states quite a bit: the median borrower rate by state ranges from approximately 0.15 to 0.20.
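
A per-group median such as the one quoted for states can be computed with `tapply()`. This is a minimal sketch on made-up data, not the report's own code; the three states and their rates are illustrative only.

```r
# Made-up rates for three states; the values are illustrative only.
loans <- data.frame(
  BorrowerState = c("CA", "CA", "CA", "TX", "TX", "TX", "NY", "NY", "NY"),
  BorrowerRate  = c(0.12, 0.15, 0.21, 0.17, 0.19, 0.25, 0.10, 0.16, 0.18)
)

# Median rate per state, sorted so the extremes are easy to read off.
median_by_state <- sort(tapply(loans$BorrowerRate, loans$BorrowerState, median))
median_by_state

# Lowest and highest group medians.
range(median_by_state)
```

The same pattern works for any grouping column, so swapping `BorrowerState` for `Occupation` would give the per-occupation medians.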

Occupation is strongly associated with BorrowerRate. Occupations with a higher level of education (e.g. engineer, computer programmer, judge) have borrower rates on the lower end of the spectrum, while occupations with a lower level of education (e.g. college freshman, Nurse’s Aide, Bus Driver, Laborer) have borrower rates on the higher end. The median borrower rate by occupation ranges from approximately 0.125 to 0.225.

### Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

### What was the strongest relationship you found?

The strongest Relationship was between the Borrower Rate and Credit Grade, although

# Multivariate Plots Section


## 
##  Pearson's product-moment correlation
## 
## data:  BorrowerRate and LoanOriginalAmount
## t = -26.632, df = 28971, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.1658044 -0.1433251
## sample estimates:
##        cor 
## -0.1545848
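
Output of this form comes from base R's `cor.test()`. The call that produced it is not shown, so the following is only a sketch on simulated data; `amount` and `rate` are made-up vectors with a negative relationship built in, not columns from the dataset.

```r
# Made-up loan amounts and rates with a built-in negative relationship.
set.seed(1)
amount <- seq(1000, 25000, by = 1000)
rate <- 0.30 - 0.000005 * amount + rnorm(length(amount), sd = 0.02)

# Pearson's product-moment correlation test (the default method).
result <- cor.test(rate, amount)

result$estimate    # sample correlation
result$conf.int    # 95 percent confidence interval
result$parameter   # degrees of freedom, n - 2
result$p.value
```

Two things stand out when reading the report's own output. The degrees of freedom (28971) correspond to 28,973 pairs, far fewer than the 113,937 loans, so the test was evidently run on a subset of the data. And although the p-value is tiny, a correlation of about -0.15 is weak: squared, it accounts for roughly 2.4% of the variance.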

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning: Removed 591 rows containing missing values (stat_summary).

## Warning: Removed 591 rows containing missing values (stat_summary).

## Warning: Removed 591 rows containing missing values (stat_summary).

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated

## Warning in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels)
## else paste0(labels, : duplicated levels in factors are deprecated
## Warning: Removed 3 rows containing missing values (geom_point).
## Warning: Removed 17 rows containing missing values (geom_point).
## Warning: Removed 24 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).
## Warning: Removed 44 rows containing missing values (geom_point).
## Warning: Removed 20 rows containing missing values (geom_point).
## Warning: Removed 89 rows containing missing values (geom_point).
## Warning: Removed 101 rows containing missing values (geom_point).
## Warning: Removed 92 rows containing missing values (geom_point).
## Warning: Removed 286 rows containing missing values (geom_point).
## Warning: Removed 463 rows containing missing values (geom_point).
## Warning: Removed 67 rows containing missing values (geom_point).
## Warning: Removed 41 rows containing missing values (geom_point).
## Warning: Removed 157 rows containing missing values (geom_point).
## Warning: Removed 142 rows containing missing values (geom_point).
## Warning: Removed 194 rows containing missing values (geom_point).
## Warning: Removed 143 rows containing missing values (geom_point).
## Warning: Removed 181 rows containing missing values (geom_point).
## Warning: Removed 169 rows containing missing values (geom_point).
## Warning: Removed 276 rows containing missing values (geom_point).
## Warning: Removed 351 rows containing missing values (geom_point).
## Warning: Removed 508 rows containing missing values (geom_point).
## Warning: Removed 580 rows containing missing values (geom_point).
## Warning: Removed 571 rows containing missing values (geom_point).
## Warning: Removed 541 rows containing missing values (geom_point).
## Warning: Removed 328 rows containing missing values (geom_point).
## Warning: Removed 457 rows containing missing values (geom_point).
## Warning: Removed 440 rows containing missing values (geom_point).
## Warning: Removed 941 rows containing missing values (geom_point).
## Warning: Removed 905 rows containing missing values (geom_point).
## Warning: Removed 382 rows containing missing values (geom_point).


Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. Were there features that strengthened each other in terms of looking at your feature(s) of interest?

Were there any interesting or surprising interactions between features?

I discovered that the prosper ratings

OPTIONAL: Did you create any models with your dataset? Discuss the strengths and limitations of your model.


Final Plots and Summary

Plot One

Description One

Plot Two

Description Two

Plot Three

Description Three

Up until 2009, BorrowerRate was relatively well correlated with the credit score (or credit grade) and with the loan original amount. …..
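
One way to put a number on this kind of claim is to compute the correlation separately for each period. The sketch below is only an illustration, not the code used to produce the plots: the helper `rate_cor_by_period` is a hypothetical name, it assumes a data frame with the `BorrowerRate`, `CreditScoreRangeLower` and `LoanPeriod2` columns, and the `toy` data frame is made up so the example runs on its own.

```r
# Correlation of BorrowerRate with one predictor, computed per loan period.
# Assumes df has the columns BorrowerRate, LoanPeriod2 (a factor) and the
# column named in `predictor`. The function name is hypothetical.
rate_cor_by_period <- function(df, predictor = "CreditScoreRangeLower") {
  sapply(split(df, df$LoanPeriod2), function(d) {
    # use = "complete.obs" drops rows where either variable is NA
    cor(d$BorrowerRate, d[[predictor]], use = "complete.obs")
  })
}

# Toy data, invented for illustration only (not taken from the Prosper data)
toy <- data.frame(
  BorrowerRate          = c(0.30, 0.20, 0.10, 0.15, 0.25, 0.20),
  CreditScoreRangeLower = c(600, 700, 800, 600, 700, 800),
  LoanPeriod2           = factor(rep(c("pre-recession", "post-recession"),
                                     each = 3))
)

# Returns a named numeric vector with one correlation per period
rate_cor_by_period(toy)
```

On the real data, the same call could be made with `loanData` in place of `toy`, and with `predictor = "LoanOriginalAmount"` to check the second relationship.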

Reflection

The Prosper loan data is a rich dataset that reveals a great deal about how loans were distributed over time.